ABSTRACT 

We examined the relation between reading prosody and reading comprehen- 
sion, using a systematic review and meta-analysis to estimate the strength of the 
relation and to understand whether the strength of the relation varies by 
prosody feature (adult-like contour, F0 sentence-final declination, grammatical 
pauses, ungrammatical pauses, prosody scale), students’ developmental phase 
of reading skill as examined by grade level, and orthographic depth. A total of 35 
studies (N = 9,349; Grades 1-9, 8 languages) met inclusion criteria. Overall 
a moderate relation (.51) was found between reading prosody and reading 
comprehension. Furthermore, the strength varied by prosody feature such 
that the relation was stronger for prosody rating scale than for pitch indicators 
such as adult-like contour and F0 sentence-final declination. However, grade 
and orthographic depth were not significant moderators. These results suggest 
that the relation between reading prosody and reading comprehension is not 
unitary and that studies of this relation should consider specific aspects of reading prosody. 


The ability to read connected text with speed, accuracy, and expression (reading fluency) is an 
important skill for reading comprehension (Kim, 2015, 2020a, 2020b; Kuhn, Schwanenflugel, & 
Meisinger, 2010; National Institute of Child Health and Human Development [NICHD], 2000; 
Pikulski & Chard, 2005; Wolf & Katzir-Cohen, 2001). Students who struggle with fluent reading are 
often found to have difficulty with reading comprehension (Sabatini, Wang, & O'Reilly, 2018). 
Although the definition of reading fluency includes three aspects, accuracy, speed, and expression 
(i.e., prosody; Hudson, Lane, & Pullen, 2005; Kuhn et al., 2010), the majority of reading fluency 
research and tools used for classroom assessment, such as the Dynamic Indicators of Basic Early 
Literacy Skills (DIBELS) Oral Reading Fluency (Good & Kaminski, 2002), do not include expression or 
reading prosody (Dowhower, 1991; Kuhn et al., 2010), and only measure speed and accuracy (Baker, 
Park, & Baker, 2012; Daane, Campbell, Grigg, Goodman, & Oranje, 2005; Fuchs, Fuchs, Hosp, & 
Jenkins, 2001; Jenkins, Fuchs, van den Broek, Espin, & Deno, 2003; Kim, 2015; Kim, Petscher, 
Schatschneider, & Foorman, 2010; Kim & Wagner, 2015; Riedel, 2007; Roehrig, Petscher, Nettles, 
Hudson, & Torgesen, 2008; Silverman, Speece, Harring, & Ritchey, 2013; Tilstra, McMaster, van den 
Broek, Kendeou, & Rapp, 2009). Although extant work indicates a relation between reading prosody 
and reading comprehension, we do not have a solid understanding of this relation. The current work 
explores this relation using a systematic review and meta-analysis to estimate the magnitude and to 
examine potential moderators such as prosody feature (e.g., Benjamin, 2012; Kim, Quinn, & Petscher, 
2020; Schwanenflugel, Hamilton, Kuhn, Wisenbaker, & Stahl, 2004), developmental phase of reading 
(Calet, Gutiérrez-Palma, & Defior, 2015; Fernandes, Querido, Verhaeghe, & Araujo, 2018; Miller & 
Schwanenflugel, 2008; Rasinski, Rikli, & Johnston, 2009; Veenendaal, Groen, & Verhoeven, 2016), and 
orthographic depth (Hussien, 2014; Veenendaal et al., 2016). 


Reading prosody and reading comprehension 


Reading prosody is prosodic rendering when reading connected text (i.e., not lexical prosody or 
prosodic sensitivity, e.g., Kim & Petscher, 2016; Schwanenflugel & Benjamin, 2017; Wood, 2006). 
There are two alternative views on the relation between reading prosody and reading comprehension. 
One view is that reading prosody, as a part of the reading fluency construct, plays a role in reading 
comprehension by facilitating syntactic (i.e., chunking constituents) and semantic (i.e., deriving 
meaning from read words and phrases) processing (Dowhower, 1991; Kuhn et al., 2010; Schreiber, 
1980, 1987). That is, reading prosody allows one “to hold an auditory sequence in working memory” 
and “assists in maintaining an utterance in working memory until a more complete semantic analysis 
can be carried out” (Kuhn et al., 2010, p. 237). According to an alternative perspective, reading 
prosody is an indicator or an outcome of reading comprehension. That is, prosodic reading (reading 
with appropriate intonation, grouping of words into meaningful units, and pausing in appropriate 
places) relies on at least some level of text comprehension, such as knowledge of syntactic structures, 
especially since written text typically does not contain graphic cues to mark constituents (e.g., 
punctuation to denote Schreiber, 1987, 1991), and, therefore, is a product of comprehension 
(Davies, 1994; Dowhower, 1991; Ravid & Mashraki, 2007). 

Regardless of different views, however, reading prosody is expected to be related to reading 
comprehension. Indeed, extant studies have found a positive relation between prosodic reading and 
reading comprehension, and the relation varies in magnitude. For example, for monolingual English 
speakers, reading prosody and reading comprehension had a weak relation in second and third grade 
when reading prosody was measured directly by spectrographic analysis (r = .11-.30; Schwanenflugel 
et al., 2004), a moderate relation in fourth grade using a holistic rating scale where a single score is 
assigned after evaluating multiple aspects (e.g., NAEP fluency scale; r = .59; Sabatini et al., 2018), and 
a strong relation in ninth grade using an analytic rating scale where a score is assigned to each of 
different aspects (e.g., Multidimensional Fluency Scale; r = .71; Paige, Rasinski, Magpuri-Lavell, & 
Smith, 2014). Somewhat similar magnitudes have been found with monolingual fourth graders of 
Dutch (r = .41; Veenendaal et al., 2016) and Turkish (r = .44; Yildiz & Cetinkaya, 2017) with analytic 
rating scales. Variation in the magnitudes of the relation between reading prosody and reading 
comprehension may be explained by the feature of prosody measured, reading development, and 
orthographic depth. 


Reading prosody features 


Prosody is characterized by intonation, stress, duration, and pausing (Couper-Kuhlen, 1986; Kuhn 
et al., 2010; Schreiber, 1980, 1987). Prosody conveys paralinguistic information that supports 
a listener’s comprehension of a speaker’s intended meaning and emotion (Wilson & Wharton, 
2006). For example, in English, a Wh-question declines in pitch and volume at the end of the sentence, 
whereas a yes/no question increases in pitch and volume at the end of a sentence. Reading prosody has 
been primarily measured in two ways, by rating scale or spectrographic analysis. Rating scales capture 
the listener’s perceptions of reading prosody such as expressiveness, phrasing, smoothness, pace, and 
deviations from text on a scale (e.g., 1 to 4). On the other hand, spectrographic analysis is done 
through analysis of a spectrogram, a visualization of sound waves (Denes & Pinson, 1993), by precise 
measurements of sound waves to directly measure pause structure in milliseconds such as pause 
duration and frequency (e.g., grammatical and ungrammatical pauses), and pitch changes such as 
intonation contour (i.e., vocalic nuclei) and F0 changes (e.g., F0 sentence-final declination) in Hertz 
(number of sound wave cycles per second; Crystal, 2011). Although both approaches measure reading 
prosody, they measure different features or aspects in different ways, and, thus, it is unclear whether 
the features and measures are comparable or whether the relation of reading prosody and reading 
comprehension varies as a function of the feature of reading prosody. 

Numerous reading prosody rating scales have been introduced since the 1980s (e.g., Allington, 1983; 
NAEP oral reading fluency scale by Daane et al., 2005; Six Dimensions Fluency Rubric by Pinnell & 
Fountas, 2010; the Multidimensional Fluency Scale by Zutell & Rasinski, 1991; prosodic map by Ravid & 
Mashraki, 2007), but two are most commonly used. One is the NAEP oral reading fluency scale (Daane 
et al., 2005) which uses a four-point scale to measure phrasing, deviations from text, syntax, and expression 
holistically together. The other is the Multidimensional Fluency Scale (Rasinski, 2004; adapted from Zutell 
& Rasinski, 1991) which uses a 16-point analytic scale where each of the four categories (expression and 
volume, phrasing, smoothness, and pace) has its own 4-point rating scale that is then added up to get the 
total score out of 16. Although this is an analytic scale, typically studies only report a total prosody score. 
These scales have been adapted for speakers of languages other than English such as Spanish (Gonzalez-Trujillo, 
Calet, Defior, & Gutiérrez-Palma, 2014) and Turkish (Yildiz et al., 2014), and were found to be 
reliable and valid (Daane et al., 2005; Moser, Sudweeks, Morrison, & Wilcox, 2014; Rasinski, 2004; Smith & 
Paige, 2019). A recent study with English-speaking students in Grades 1 to 3 showed that the four aspects 
of the Multidimensional Fluency Scoring Guide loaded onto a single latent variable together with pause 
structures measured by spectrographic analysis (i.e., ungrammatical pause duration and frequency; Kim, 
Quinn, et al., 2020). In contrast, pitch or intonation features such as F0 sentence-final declination and 
intonation contour formed a separate latent variable (Kim, Quinn, et al., 2020). These results indicate that 
the Multidimensional Fluency Scale likely captures the decoding-related prosody aspect similar to previous 
studies that reported a strong relation of pause structure reading prosody to decoding skills (Benjamin & 
Schwanenflugel, 2010; Binder et al., 2013; Kim, Quinn, et al., 2020; Miller & Schwanenflugel, 2008; 
Schwanenflugel et al., 2004). 

Reading prosody has also been examined using spectrographic analysis. Because spectrographic 
analysis allows for the exact duration of a pause to be measured in milliseconds and for change in 
fundamental frequency (F0) to be measured in Hertz, studies using spectrographic analysis measured 
reading prosody features such as pause structure and pitch variation. Schwanenflugel and colleagues 
(Benjamin & Schwanenflugel, 2010; Miller & Schwanenflugel, 2006, 2008; Schwanenflugel et al., 2004) 
captured duration and frequency of pauses, adult-like F0 (pitch) contour (through vocalic nuclei 
matching), and F0 sentence-final declination (difference in Hertz from one wave peak to the next). 
Using data from English-monolingual second graders, they found weak to moderate magnitudes for 
the relations of various aspects of reading prosody to reading comprehension, ranging from r = .03 for 
ungrammatical pauses to r = .31 for F0 sentence-final declination (Schwanenflugel et al., 2004). 
However, another study with English-monolingual second graders found stronger magnitudes overall, 
ranging from rs = .21-.36 for adult-like contour to r = .59 for grammatical pauses (Benjamin & 
Schwanenflugel, 2010). Clearly, not only are there inconsistent findings across the features of reading 
prosody, but also across studies examining the same features. 


Reading development 


Reading fluency is influenced by development because decoding skill (the ability to sound out real or 
nonsense words based on grapheme-phoneme correspondence knowledge) constrains the ability to 
read with speed, accuracy, and prosody during connected text reading (Kim, Quinn, et al., 2020; Kuhn 
et al., 2010; Schwanenflugel et al., 2004). According to the automaticity theory (LaBerge & Samuels, 
1974), a reader needs to have proficient lower order skills (i.e., decoding) so that working memory 
resources are available to chunk text together (i.e., morphosyntax; Schreiber, 1987) and construct 
meaning (semantic processing) to support prosodic reading. A recent study showed that word reading 
strongly predicted reading prosody for English-speaking children in Grades 1 to 3 (Kim, Quinn, et al., 
2020). If word reading is the primary driver of reading prosody at least in the beginning phase of reading 
development, then reading prosody captures word reading skills to a large extent in the beginning phase. 
In the later phase, however, reading prosody is expected to facilitate semantic processing (Kuhn et al., 
2010) or is a function of semantic processing (Davies, 1994; Ravid & Mashraki, 2007) as the constraining 
role of decoding is lifted with reading development (Adlof, Catts, & Little, 2006; Florit & Cain, 2011; 
Kim, 2015). 

If reading prosody as a construct captures both decoding and semantic processes, then the 
relation between reading prosody and reading comprehension might not change largely as 
a function of reading development. However, if specific measures of reading prosody capture 
different aspects/features of reading prosody (that is, if different reading prosody aspects primarily 
capture decoding or semantic processing), then the relation between reading prosody and reading 
comprehension would differ as a function of reading development for different features. For reading 
prosody features that primarily capture decoding skills (e.g., pause structure such as inappropriate 
or ungrammatical pauses; Arcand et al., 2014; Binder et al., 2013), the relation between reading 
prosody and reading comprehension will be stronger in the beginning phase of development and 
will become weaker at a more advanced phase. In contrast, for reading prosody features that 
primarily capture semantic processing (e.g., pitch indicators such as child-adult pitch [F0] match, 
or F0 sentence-final declination; Binder et al., 2013; Schwanenflugel et al., 2004), the relation will be 
weaker in the beginning phase of reading development and will become stronger with reading 
development. 

A small number of studies have explored whether the relation between reading prosody and 
reading comprehension varies over developmental phases. The limited longitudinal studies yielded 
inconsistent findings even when reading prosody was measured by the same approach, a rating scale. 
A study with English-monolingual second graders showed similar strengths in magnitudes in the fall 
and winter, r = .77 and r = .76, respectively, and a weaker magnitude in the spring, r = .59 (Lai, 
Benjamin, Schwanenflugel, & Kuhn, 2014). A study in European Portuguese found that the relation 
was moderate for students in Grade 2 (r = .38) and Grade 3 (r = .31), whereas the relation was weaker 
for students in upper elementary grades (r = .18 in Grade 4 and r = .06 in Grade 5; Fernandes et al., 
2018). A longitudinal study with Dutch-monolinguals reported r = .39 in fourth grade, r = .39 in fifth 
grade, and r = .60 in sixth grade (Veenendaal et al., 2016). 


Orthographic depth 


At the center of the theoretical accounts of the relation between reading prosody and reading compre- 
hension is semantic processing (see above; Chafe, 1994; Davies, 1994; Koriat, Greenberg, & Kreiner, 
2002; Kuhn et al., 2010), but semantic processing, including reading prosody, is constrained by word 
reading skill (Kim, Quinn, et al., 2020; Kuhn et al., 2010). Therefore, reading prosody is a function of and 
captures both word reading and semantic processing. Consequently, two hypotheses with regard to 
orthographic depth are reasonable. First, the relation between reading prosody and reading comprehen- 
sion is expected to vary by orthographic depth as a function of reading prosody feature and reading 
development. The rate at which a reader develops decoding varies as a function of the orthographic 
depth of the language: word reading acquisition occurs at a faster rate in shallow orthographies (Aro & 
Wimmer, 2003; Ellis et al., 2004; Katz & Frost, 1992; Seymour, Aro, & Erskine, 2003). Therefore, word 
reading would play a constraining role for a longer time in deep orthographies (e.g., kindergarten to 
Grade 2 in English) than in shallow orthographies (e.g., Grade 1). This was supported in a meta-analysis 
such that the relation between word reading fluency and reading comprehension was .79 for children 
(Grades 3-5) learning to read in English, a deep orthography, whereas it was .48 (Grades 3-5) for 
children learning to read in more shallow orthographies (Florit & Cain, 2011). Then, for languages with 
transparent orthographies, reading prosody features that primarily capture decoding skills (e.g., ungram- 
matical pausing) would show a short-lived strong relation between decoding-related reading prosody 
and reading comprehension in comparison to deep orthographies (i.e., relation becomes weaker at an 
earlier grade). Consequently, reading prosody features that capture intonation modulation (adult-like 
contour, F0 sentence-final declination), which draw on semantic processing skills, would have 
a stronger relation with reading comprehension at an earlier grade in shallow orthographies than in 
deep orthographies. 

The second possibility is that the overall relation between reading prosody and reading comprehension 
is stronger in deep orthographies. In deep orthographies, accurate word reading requires knowledge of 
morphological and morphosyntactic skills to a greater extent than in shallow orthographies because 
spellings of words represent morphemes as well as phonemes (e.g., Joshi, Treiman, Carreker, & Moats, 
2008/2009; McBride-Chang et al., 2005). If accurate word reading requires morphological and morpho- 
syntactic processing to a greater extent in deep orthographies, and word reading constrains reading 
prosody, then reading prosody in deep orthographies would reflect morphosyntactic processing to 
a greater extent. In this case, the relation between reading prosody and reading comprehension would 
be stronger in deep orthographies than in shallow orthographies, given the role of morphological proces- 
sing in reading comprehension (e.g., Carlisle, 2000; Frost, 2005; Kieffer, Biancarosa, & Mancilla-Martinez, 
2013; Kieffer & Box, 2013; Kim, Guo, Liu, Peng, & Yang, 2020). This certainly does not deny that reading 
prosody captures sentence-level morphosyntactic and higher order semantic processing (e.g., communicative 
intent); this is not expected to differ across orthographic depth. However, if morphological and 
morphosyntactic processing is captured in reading prosody to a greater extent in deep orthographies by 
way of word reading, then it seems plausible that the overall relation between reading prosody and reading 
comprehension may be stronger in deep orthographies. 

Extant, although limited, research, however, does not suggest a clear picture about a differential 
relation by orthographic depth even for studies with students at similar grades. For example, two 
studies, Fernandes et al. (2018) and Calet et al. (2015), worked with fourth graders in shallow 
orthographies (European Portuguese and Spanish, respectively), using adapted versions of the 
Multidimensional Fluency Scoring Guide (Rasinski, 2004). In European Portuguese, a less shallow 
orthography than Spanish, reading prosody and reading comprehension had a weak relation, r = .18, 
whereas in Spanish, a more shallow and highly consistent orthography, they had a moderate relation, 
r = .47. Studies with fifth graders learning to read in English that also used the Multidimensional 
Fluency Scoring Guide found moderate to strong relations (.49 < rs < .73; Klauda & Guthrie, 2008; 
Mokhtari & Thompson, 2006; Rasinski et al., 2009; Sargent, 2002). 


The current study 


Reading prosody has long been considered as an important aspect of the text reading fluency construct 
(Kuhn et al., 2010; NICHD, 2000). Studies examined various aspects of reading prosody using different 
measurement approaches and showed varying relations between reading prosody and reading compre- 
hension. To expand our understanding of the relation between reading prosody and reading compre- 
hension, we addressed the following research questions. First, what is the average magnitude of the 
relation between reading prosody and reading comprehension? Second, does the strength of the relation 
differ by prosody feature with and without controlling for grade and orthographic depth? Third, does 
the relation vary as a function of reading development (using grade as a proxy) and orthographic depth? 

We hypothesized that reading prosody and reading comprehension would be related, but the 
magnitude would differ by prosody feature. We posited that the relation may also vary as a function of 
reading development, but it would depend on aspects or features of reading prosody. We expected that 
orthographic depth would also moderate the relation, but it would depend on overall relation versus 
specific prosody features. For the overall relation, a stronger relation was expected in deep orthogra- 
phies than in shallow orthographies, whereas for the specific prosody features, the moderation was 
expected to differ by developmental phase (see above for specifics). It should be noted, however, that 
we could not examine the latter (moderation by specific prosody features) due to a limited number of 
effect sizes available (see below). 


Method 

Literature search 


The following databases were searched through ProQuest: Educational Resources Information Center 
(ERIC), APA PsycInfo, Sociological Abstracts, Linguistics and Language Behavior Abstracts (LLBA), 
Dissertations & Theses Global, and ProQuest Dissertations & Theses A&I. No study was excluded 
based on peer review or publication status. In addition to keywords, names of authors who created 
prosody scales and the titles of the prosody scales (e.g., Rasinski, 2004) were also included. Although 
this is not conventional, this approach was taken because of the inconsistent use of the term “prosody” 
(e.g., some texts used “expression” or “reading fluency” instead). The following search terms were 
paired with reading comprehension: prosody, Multidimensional AND Rasinski, NAEP scale AND oral 
reading fluency, spectrograph, Fountas AND Pinnell AND oral reading fluency, Tindal and Marston 
(1996), and Allington (1983). 

The following were inclusion criteria in the current systematic review and meta-analysis: (1) both 
reading prosody (of a connected text of more than one sentence) and reading comprehension were 
measured; (2) reading prosody was measured through either spectrographic analysis or a rating scale; 
(3) the primary participants (over 50% of the sample) were measured in their first language; (4) the 
primary participants had no severe disabilities such as intellectual disabilities and severe behavioral 
disabilities (this criterion affected very few studies, and studies with developmental language disorder 
or learning disabilities were included); (5) the study was published between 2000 and 2019; (6) the 
study was published in English; and (7) data were not from an experimental group after a reading 
prosody or reading comprehension intervention — for studies where an intervention was conducted, 
only the pretest and control group data were included. 

Search results were uploaded to an online meta-analysis review tool that allows for two researchers 
to conduct a double-blind review (i.e., Covidence). The first author and a research assistant reviewed 
abstracts of results. There was full agreement on 90% of the articles. For the remaining 10%, a mutual 
decision was made after discussion. In addition, the following journals were digitally hand searched 
with the terms prosody and reading comprehension: Reading Research Quarterly, Journal of Research 
in Reading, Reading and Writing Quarterly, and Reading and Writing. Regarding the studies from the 
journals, there was 97% agreement between the primary and secondary authors; only two studies had 
discrepancies in the inclusion decision, and they were resolved in discussion. Finally, the reference 
sections of all related articles were examined and citation chained. 

As can be seen in the PRISMA flow diagram (Figure 1), the systematic review and meta-analysis 
was conducted over several stages with two double-blind review screenings (Borenstein, Hedges, 
Higgins, & Rothstein, 2011). The first screening examined titles, keywords, and abstracts. Abstracts 
were screened for terms that might suggest that reading prosody and reading comprehension were 
measured (e.g., reading comprehension, prosody, oral reading fluency, reading outcomes, expression). 
Studies that passed the first screening were then reviewed for a second screening as full articles to 
determine whether they fit the inclusion criteria (i.e., qualitative synthesis, studies that collected the 
data on both reading prosody and reading comprehension). Studies that met inclusion criteria but did 
not include correlations between reading prosody and reading comprehension were handled in two 
ways so that they could be included in the quantitative synthesis. If the study included raw data such as 
students’ reading prosody and reading comprehension scores, the relation was hand calculated. If no 
such information was given, then the author of the study was contacted via e-mail if an e-mail 
address could be found through online search engines; three studies were collected this way 
(Hussien, 2014; Jefferson, Grant, & Sander, 2017; Schwanenflugel et al., 2004). The final set of studies 
that met our inclusion criteria with available data comprised 35 studies (52 unique samples, 98 
effect sizes; N = 9,349): 28 journal articles, 6 dissertations (master’s and doctoral), and 1 paper 
presentation. There were no duplicated samples between articles and dissertations unless 
they were from different time points (e.g., longitudinal). The studies included in 
the quantitative synthesis are listed in Table 1. 


Records identified through database searching (n = 4,301) 
Records after duplicates removed (n = 3,486) 
Records excluded for irrelevant abstract, title, or keywords (n = 3,278) 
Full-text articles assessed for eligibility (n = 208) 
Records excluded (n = 180); most common reasons: (1) did not assess reading prosody or reading 
comprehension (n = 119); (2) no Pearson’s correlation available (n = 24) 
Database search studies for quantitative synthesis (n = 28) 
Studies identified through journal search (n = 7) 
Studies included in quantitative synthesis (n = 35) 


Figure 1. PRISMA flow diagram depicting the literature search process. 


Coding procedures 


All studies that met the inclusion criteria were coded for the following aspects: sample size, effect size 
(Pearson’s r), participant grade (proxy for reading development), language, measurement method (i.e., 
rating scale or spectrograph), and feature of prosody measured (e.g., adult-like contour). All studies 
were coded by both the first author and the third author; there was 100% agreement. Language was 
also coded as a dichotomous variable (0 = shallow, 1 = deep). Arabic, Hebrew, English, and French 
were considered as opaque (deep) orthographies; Dutch, Turkish, Spanish, and European Portuguese 
were considered as shallow (Ellis et al., 2004; Seymour et al., 2003). Grade was examined as 
a continuous variable. Using the following equations, Pearson’s r was converted to Fisher’s z and 
variance was calculated from the sample size (Borenstein et al., 2011). 
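
The equations themselves are not reproduced in the text above. As a hedged reconstruction, the standard Fisher transformation and its sampling variance, as given in general meta-analysis references such as Borenstein et al. (2011), are shown below. This assumes the authors used these standard forms, with r the sample correlation and n the sample size.

```latex
z = \frac{1}{2}\,\ln\!\left(\frac{1 + r}{1 - r}\right), \qquad
V_z = \frac{1}{n - 3}, \qquad
r = \frac{e^{2z} - 1}{e^{2z} + 1}
```

The last expression converts a pooled z back to the correlation metric for reporting.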

Table 1. Studies included in meta-analysis. 


Study                                     N          Pearson’s r            Grade    Language                        Prosody method
Arcand et al. (2014)                      261        .03-.61                2        French (CAN)                    GP, UP
Basaran (2013)                            90         -.10 to .85            4        Turkish (TUR)                   Scale
Benjamin (2012) [D/P] - Study 1           90         .43-.53                2        English (USA)                   Scale
Benjamin and Schwanenflugel (2010)        90         .21-.70                2        English (USA)                   ALC, F0, GP, UP
Benjamin et al. (2013) - Study 2          60         .46                    3        English (USA)                   Scale
Brown, Mohr, Wilcox, and Barrett (2018)   25         .51, .58               3        English (USA)                   Scale
Calet et al. (2015)                       50         [unclear]              2        Spanish (ESP)                   Scale
                                          48         .47                    4        Spanish (ESP)                   Scale
Dawson (2015) [D/P]                       113        .43                    6, 7, 8  African American English (USA)  Scale
Evanchan (2015) [D/P]                     22         .29                    2        English (USA)                   Scale
Fernandes et al. (2018)*                  81         .31, .38               2-3      European Portuguese (POR)       Scale
                                          76         .06-.18                4-5      European Portuguese (POR)       Scale
Gonzalez-Trujillo et al. (2014)           74         .61                    2        Spanish (ESP)                   Scale
                                          48         .49                    4        Spanish (ESP)                   Scale
Groen, Veenendaal, and Verhoeven (2018)   63         .52                    3        Dutch (NLD)                     Scale
Hammer (2003) [D/P]                       13         .06, .44               3        English (USA)                   Scale
Hussien (2014)                            [unclear]  [unclear: 44833]       6        Arabic (EGY)                    Scale
Jefferson et al. (2017)                   83         -.33 to .44            3        English (USA)                   Scale
                                          30         .19-.27                3        English (USA)                   Scale
Kariuki and Baxter (2011) [D/P]           10         .88                    2        English (USA)                   Scale
Klauda and Guthrie (2008)                 145        .67, .68               5        English (USA)                   Scale
Lai et al. (2014)*                        154        .59-.77                2        English (USA)                   Scale
Marrone (2014) [D/P]                      6          0                      1        English (USA)                   Scale
May (2014) [D/P]                          68         .04-.54                5        English (USA)                   ALC, F0, GP, UP
Miller and Schwanenflugel (2008)*         92         .24-.56                1-2      English (USA)                   ALC, UP
Mokhtari and Thompson (2006)              [unclear]  [unclear: 3200 A273]   5        English (USA)                   Scale
Paige et al. (2014)                       108        .71                    9        English (USA)                   Scale
Rasinski et al. (2017)                    37         [unclear: 12.28]       3        English (USA)                   Scale
Rasinski et al. (2009)                    391        .63                    3        English (USA)                   Scale
                                          421        .66                    5        English (USA)                   Scale
                                          392        .57                    7        English (USA)                   Scale
Ravid and Mashraki (2007)                 51         .51                    4        Hebrew (ISR)                    Scale (map)
Sabatini et al. (2018)                    1,714      .59                    4        English (USA)                   Scale
Sargent (2002)                            52         .22-.57                5        English (USA)                   Scale
Schwanenflugel et al. (2004)              120        .11-.29                3        English (USA)                   ALC, F0, GP, UP
Taylor, Meisinger, and Floyd (2013)       72         .48                    3        English (USA)                   Scale
Tortorelli (2018)                         2,191      .09                    2        English (USA)                   Scale
Veenendaal et al. (2016)*                 99         .39, .39, .60          4-6      Dutch (NLD)                     Scale
Yildirim, Rasinski, and Kaya (2019)       100        .11                    4        Turkish (TUR)                   Scale
                                          100        .48                    5        Turkish (TUR)                   Scale
                                          100        .49                    6        Turkish (TUR)                   Scale
                                          100        .46                    7        Turkish (TUR)                   Scale
                                          100        .50                    8        Turkish (TUR)                   Scale
Yildiz and Cetinkaya (2017)               132        .44                    4        Turkish (TUR)                   Scale
Yildiz et al. (2014)                      119        .45                    5        Turkish (TUR)                   Scale
Overall (35 studies, 52 samples)          9,349      .51                    1-9      8 languages                     5 methods/features


Studies marked by an asterisk (*) were longitudinal; if a sample underwent attrition, the lowest sample size is reported above. The total 
meta-analysis sample (N = 9,349) counts longitudinal samples measured at different grade levels as unique. If more than two effect 
sizes were found, a range is reported. African American English is included with American English. Studies marked [D/P] are 
dissertations (master’s or doctoral) or paper presentations. Hammer (2003) uses the terms “pre-test” and “post-test,” but there was 
no instruction between assessments; one prosody assessment was a cold read and the other gave students time to practice first. 
Benjamin (2012) and Benjamin et al. (2013) report the same analysis, with two different studies within each publication; due to the 
way data were presented, the data for Study 1 were coded from Benjamin (2012) and the data for Study 2 were coded from 
Benjamin et al. (2013). Benjamin (2012) analyzed data from Benjamin and Schwanenflugel (2010) with a different tool; thus, they 
were treated as the same sample. ALC = adult-like contour, F0 = F0 sentence-final declination, GP = grammatical pausing, UP = 
ungrammatical pausing. Country abbreviations are as follows: CAN = Canada, EGY = Egypt, ESP = Spain, ISR = Israel, NLD = 
Netherlands, POR = Portugal, TUR = Turkey, USA = United States of America. 


Data analysis 


Data were loaded into R (R Core Team, 2013), and the overall effect size, confidence intervals, and 
meta-regression models were estimated with the robumeta package, which implements robust variance 
estimation (RVE; Hedges, Tipton, & Johnson, 2010). Robumeta accounts for small samples by using the 
robust variance estimator to apply appropriate weighting (Tipton, 2015). Robumeta uses Fisher’s z and 
its variance to weight effect sizes by sample size and then outputs an estimated effect size that can be 
interpreted. Because the samples vary and are treated as drawn from a larger population, random effects 
were used instead of fixed effects so that effect sizes were weighted accurately (Kreft 
& de Leeuw, 1998; Viechtbauer, 2005). The I² and Q statistics indicate heterogeneity and whether 
moderator analysis is appropriate (Higgins & Thompson, 2002). The I² statistic indicated that 
approximately 93.12% of the total observed variance was due to differences between the studies rather 
than within-study sampling error. Given the significant heterogeneity, we conducted moderator 
analysis to identify the source of the between-study variation. The included moderators were prosody 
features (e.g., scale, adult-like contour), grade (as a proxy for reading development), and orthographic 
depth (0 = shallow, 1 = deep). Robumeta tests whether there is a statistical difference between the 
overall effect sizes of the groups by moderator and gives an interpretable p-value (p < .05 was 
considered as significant). 
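
The quantities named in this paragraph (Fisher's z, inverse-variance weights, Q, and I²) can be illustrated with a minimal sketch. It assumes a simple DerSimonian-Laird random-effects model with one effect size per sample; it is not the robust variance estimator implemented in robumeta, and the correlations and sample sizes are hypothetical rather than taken from Table 1.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def z_to_r(z):
    """Back-transform a Fisher z value to the correlation metric."""
    return math.tanh(z)


def random_effects_pool(rs, ns):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    Illustrative only: this is NOT the robumeta RVE estimator.
    """
    zs = [fisher_z(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]   # sampling variance of Fisher z
    w = [1.0 / v for v in vs]          # inverse-variance (fixed-effect) weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0   # % of variance between studies

    # Random-effects weights add tau2 to each sampling variance
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "r": z_to_r(z_re),
        "ci": (z_to_r(z_re - 1.96 * se), z_to_r(z_re + 1.96 * se)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical correlations and sample sizes (not from Table 1)
example = random_effects_pool([0.30, 0.45, 0.60, 0.55], [80, 120, 100, 150])
print(example)
```

Pooling is done on the z scale and only the final estimate and interval are back-transformed to r, which mirrors the description above of reporting an estimated effect size in an interpretable metric.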

The following analytic procedures were carried out to address each research question. For the first 
research question, an average effect size (k = 98) was estimated. If multiple correlations were available 
from the same sample and time period due to multiple measures of reading prosody and reading 
comprehension, one average effect size was calculated in robumeta for each group. For the second 
research question, meta-regression was used with prosody features as predictors (a series of dichot- 
omous variables) with and without controlling for grade and orthographic depth. For the third 
research question, a meta-regression was fitted with grade and orthographic depth as moderators. Note that the 
analysis for the third question could not be conducted separately by reading prosody feature (e.g., rating scale, 
F0 sentence-final declination, ungrammatical pauses) because of the insufficient number of effect sizes per 
prosody feature (particularly for those that examined intonation modulation using spectrographic 
analysis, such as F0 sentence-final declination). 
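
The dummy coding described for the second research question can be sketched as follows. This is an assumption-laden illustration: it uses ordinary weighted least squares with inverse-variance weights rather than robumeta's RVE weights, the helper names (`dummy_code`, `solve`, `wls`) are hypothetical, and the data are invented. With only the dummy variables in the model, the intercept is the weighted mean of the reference group and each coefficient is that feature's difference from the reference.

```python
def dummy_code(features, reference):
    """Build design rows [1, d_1, ..., d_k]; `reference` is the omitted level."""
    levels = sorted(set(features) - {reference})
    rows = [[1.0] + [1.0 if f == lev else 0.0 for lev in levels] for f in features]
    return rows, ["intercept"] + levels


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def wls(rows, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(p)]
    return solve(xtwx, xtwy)


# Hypothetical Fisher z effect sizes, sample sizes, and prosody features
features = ["scale", "scale", "scale", "contour", "contour", "pauses", "pauses"]
z_values = [0.60, 0.55, 0.62, 0.30, 0.36, 0.40, 0.44]
sample_sizes = [100, 120, 90, 80, 60, 70, 110]
weights = [n - 3 for n in sample_sizes]  # inverse of the Fisher z variance 1/(n - 3)

rows, names = dummy_code(features, reference="scale")
coefficients = dict(zip(names, wls(rows, z_values, weights)))
print(coefficients)
```

Covariates such as grade or orthographic depth could be appended as extra columns of each design row to obtain the controlled version of the model.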


Results 


Question 1: what is the average magnitude of the relation between reading prosody and 
reading comprehension? 


A final sample of 35 studies (52 unique samples, N = 9,349) with 98 effect sizes¹ was used in the 
analysis (see Table 1). The overall average correlation between reading prosody and reading compre- 
hension was .51 (95% CI = [0.44, 0.57]; Table 2), and there was large variation in correlations, ranging 
from r = 0 to r = .88 (Figure 2). 


Question 2: does the strength of the relation differ by prosody feature with and without 
controlling for grade and orthographic depth? 


The strength of the relation between reading prosody and reading comprehension differed by reading 
prosody aspects, ranging from .31 (grammatical pauses) to .53 (rating scale; see Table 2). However, the 
only statistically significant difference in magnitudes was between adult-like contour (r = .33) and prosody rating 
scale (r = .53, p = .02), such that the relation between reading prosody and reading comprehension is 
stronger when reading prosody is measured by rating scale than by adult-like contour. 


Table 2. The relations of various prosody measures with reading comprehension. 

Feature                 b      SE     CI.LB   CI.UB   k    Levels   I²      Grades 
Overall                 0.51   0.03    0.44   0.57    98   52       93.12   1-9 
Adult-like contour      0.32   0.03    0.24   0.40    8    6        0       2-5 
F0 declination          0.34   0.08    0.09   0.60    6    4        35.00   2-5 
Grammatical pauses      0.31   0.14   -0.07   0.69    8    5        89.08   2-5 
Ungrammatical pauses    0.38   0.09    0.16   0.60    10   7        82.76   1-5 
Prosody rating scale    0.53   0.04    0.46   0.60    66   46       93.58   1-9 

The features were tested in separate meta-regression models. The I² statistic shows large heterogeneity across all estimated effect 
sizes except adult-like contour. Levels = number of unique samples. CI.LB = lower bound of 95% confidence interval; CI.UB = upper 
bound of 95% confidence interval. 


Figure 2. Forest plot of all effect sizes exhibiting the relation between reading prosody and reading comprehension. 

The relations remained essentially the same when grade and orthographic depth were added as 
covariates in the model (Table 3). The reference group in this model is rating scale (r = .41). Adult-like 
contour (r = .16) remained statistically significant (p = .03), indicating that its relation with reading 
comprehension is weaker than that between rating scale (the reference group) and reading 
comprehension after controlling for grade and orthographic depth. It should be noted that F0 
sentence-final declination (r = .14) and grammatical pauses (r = .07) fell just short of 
conventional statistical significance (ps = .06). 


Table 3. The relation between reading prosody features and reading comprehension with prosody scale as the 
reference controlling for grade and orthographic depth. 

Features                b       SE     p       CI.LB   CI.UB 
Intercept               0.41    0.10   <.001    0.19    0.62 
Adult-like contour     -0.25    0.09   0.03    -0.47   -0.03 
F0 declination         -0.27    0.09   0.06    -0.56    0.01 
Grammatical pauses     -0.34    0.13   0.06    -0.70    0.02 
Ungrammatical pauses   -0.14    0.12   0.28    -0.43    0.15 
Grade                   0.01    0.02   0.58    -0.03    0.05 
Orthographic depth      0.15    0.07   0.04     0.01    0.28 

CI.LB = lower bound of 95% confidence interval; CI.UB = upper bound of 95% confidence interval. 
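
The feature-level values quoted in the text for Table 3 follow from adding each coefficient to the intercept, taking the intercept as the estimate for the rating-scale reference group, as the text does. The short sketch below reproduces that arithmetic from the b column of Table 3; the value for ungrammatical pauses is computed the same way and is not reported in the text.

```python
# b values copied from Table 3; the intercept is the rating-scale reference group
intercept = 0.41
coefficients = {
    "Adult-like contour": -0.25,
    "F0 declination": -0.27,
    "Grammatical pauses": -0.34,
    "Ungrammatical pauses": -0.14,
}

# Implied estimate for each feature = intercept + that feature's coefficient
implied = {name: round(intercept + b, 2) for name, b in coefficients.items()}

for name, value in implied.items():
    print(f"{name}: {value:.2f}")
```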


Question 3: does this relation vary as a function of reading development (grade) and 
orthographic depth? 


As shown in Table 4, the magnitude of the relation did not differ by grade (p = .40) or orthographic depth 
(p = .18). When grade was controlled for, there was still no significant difference in the relation between 
reading prosody and reading comprehension (p = .11) between deep orthographies (r = .46) and shallow 
orthographies (r = .36). 


Sensitivity analysis 


Robust variance estimation 

Sensitivity analysis was conducted with the metafor package (Viechtbauer, 2010). Metafor yielded 
results highly similar to those from robumeta for the relation between reading prosody and reading 
comprehension, with r = .50. 


Extreme sample size 

To control for two studies that had relatively large sample sizes compared to the other studies 
(Sabatini et al., 2018: n = 1,714, r = .59; Tortorelli, 2018: n = 2,191), the analysis was conducted without 
these studies. Removing these studies did not change the overall magnitude, r = .51, suggesting that the 
large sample sizes did not affect the results. 


Orthographic depth 

European Portuguese has generally been considered a shallow orthography (Defior, Martos, & Cary, 
2002); however, some research has identified it as having intermediate depth (Seymour et al., 2003; 
Sucena, Castro, & Seymour, 2009). Therefore, the analysis was conducted without the study on European 
Portuguese (Fernandes et al., 2018). This analysis yielded no evidence of a moderation effect by orthographic 
depth with or without grade as a control (ps = .60, .40, respectively). 


Table 4. The relation of reading prosody and reading comprehension with grade and orthographic depth as 
moderators. 

Features              b      SE     p        CI.LB   CI.UB 
Intercept             0.44   0.09   <.0001    0.26   0.62 
Grade                 0.02   0.02   0.40     -0.02   0.06 

Intercept             0.45   0.04   <.0001    0.37   0.54 
Deep orthographies    0.09   0.06   0.18     -0.04   0.21 

Intercept             0.36   0.10   0.00      0.16   0.56 
Deep orthographies    0.10   0.06   0.11     -0.02   0.23 
Grade                 0.02   0.02   0.25     -0.02   0.06 

CI.LB = lower bound of 95% confidence interval; CI.UB = upper bound of 95% confidence interval. 


Publication bias 

Studies with statistically significant findings are often favored by journals for publication, 
which is known as “publication bias” (Sterne, Egger, & Smith, 2001). Effect sizes are expected 
to be evenly distributed around the overall estimated effect size when there is no publication 
bias. Figure 3 shows a funnel plot of the distribution of effect sizes. As can be seen from Figure 
3, the effect sizes are somewhat symmetric, but they do not fall in the white area of 
the triangle, suggesting heterogeneity among the studies. A mixed-effects meta-regression 
model (weighted regression with multiplicative dispersion with standard error as 
the predictor; Egger, Smith, Schneider, & Minder, 1997; Sterne & Egger, 2005) was run to 
statistically test whether correlations were asymmetrical around the mean. The test did not 
reach the conventional significance level, z = -1.06, p = .29, suggesting no 
evidence of publication bias. 


Figure 3. Funnel plot of effect sizes from all included studies. 
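
The asymmetry test described in this section can be sketched as a weighted regression of each effect size on its standard error. The version below is a minimal illustration under stated assumptions: effect sizes are Fisher z values with standard error 1/sqrt(n - 3), the weights are inverse variances, the dispersion is estimated from the weighted residuals, and the p-value uses a normal approximation. The function name `egger_test` and the data are hypothetical, and the sketch is not guaranteed to match the output of any particular R package.

```python
import math


def egger_test(rs, ns):
    """Egger-type regression of Fisher z on its standard error (weights 1/se^2).

    Requires at least three effect sizes with differing sample sizes.
    A slope far from zero suggests funnel-plot asymmetry.
    """
    zs = [math.atanh(r) for r in rs]
    ses = [1.0 / math.sqrt(n - 3) for n in ns]
    k = len(zs)
    if k < 3:
        raise ValueError("need at least three effect sizes")

    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, ses))
    sy = sum(wi * y for wi, y in zip(w, zs))
    sxx = sum(wi * x * x for wi, x in zip(w, ses))
    sxy = sum(wi * x * y for wi, x, y in zip(w, ses, zs))

    # Closed-form weighted least squares for a single predictor
    denom = sw * sxx - sx ** 2
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw

    # Multiplicative dispersion estimated from the weighted residuals
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, ses, zs))
    phi = rss / (k - 2)
    se_slope = math.sqrt(phi * sw / denom)

    stat = slope / se_slope if se_slope > 0 else float("nan")
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided, normal approximation
    return {"slope": slope, "se": se_slope, "statistic": stat, "p_normal_approx": p_value}


# Hypothetical correlations and sample sizes (not from Table 1)
print(egger_test([0.45, 0.58, 0.50, 0.62, 0.40, 0.55], [60, 90, 150, 75, 300, 120]))
```

With few effect sizes, a t reference distribution with k - 2 degrees of freedom would be more appropriate than the normal approximation used here.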


Discussion 


In this study, we investigated the relation between reading prosody and reading comprehension using 
a systematic review and meta-analysis. Our final sample consisted of 35 studies (52 unique samples, 98 
effect sizes, N = 9,349), which included five different reading prosody features (rating scale, adult-like 
contour, F0 sentence-final declination, grammatical pauses, ungrammatical pauses) and readers from 
Grades 1 to 9 in eight languages with varying levels of orthographic depth (shallow: Turkish, Spanish, 
European Portuguese, Dutch; deep: French, Arabic, Hebrew, English). 

Overall, reading prosody and reading comprehension were moderately related, r = .51. Beyond the 
average magnitude, though, there was large variation in the strength of relations. Our hypothesis that this 
variation would be explained by reading prosody features was partially supported. Once grade and 
orthographic depth were accounted for, the relation of reading comprehension with adult-like contour 
was weaker than the relation between reading comprehension and rating scale. In contrast, rating scale 
did not have a significantly stronger relation with reading comprehension than did ungrammatical pauses, 
grammatical pauses, or F0 sentence-final declination, although grammatical pauses and F0 sentence-final 
declination showed a trend toward weaker relations than rating scale. These results appear to be in line 
with a recent study which showed that rating scale and pause structure prosody features (e.g., 
ungrammatical pauses) loaded onto a single latent variable, whereas the pitch aspect of reading prosody 
(intonation contour and F0 sentence-final declination) was a related but dissociable variable (Kim, 
Quinn, et al., 2020). Taken together, these results indicate that reading prosody is a multi-dimensional 
construct and that various measures of reading prosody tap into different aspects or dimensions of 
reading prosody (Kim, Quinn, et al., 2020), and, therefore, their relations to reading comprehension 
differ depending on the aspects. 

It is unclear why the relation of reading prosody and reading comprehension is stronger when 
reading prosody is measured by a rating scale than by pitch or intonation measured by spectro- 
graphic analysis. One explanation is that rating scales capture multiple aspects. For example, the 
Multidimensional Fluency Scoring Guide (Rasinski, 2004) examines four categories, expression 
and volume, phrasing, smoothness, and pace. Although recent studies showed that all of these 
four aspects essentially capture a single construct with (Kim, Quinn, et al., 2020) and without 
pause structure indicators (Benjamin & Schwanenflugel, 2010), evaluating multiple aspects 
might somehow provide a richer picture of reading prosody, which in turn leads to a stronger relation 
with reading comprehension. Alternatively, the results likely reflect a limitation of the extant 
literature. As shown in Table 1, the number of studies using spectrographic analysis (n = 5) was 
extremely limited compared to those that employed rating scales (n = 30). Moreover, most 
studies using spectrographic analysis were conducted with students in primary grades 
(Grades 1 to 3, with one study in Grade 5), whereas the grade levels of studies that employed 
rating scales ranged from Grade 1 to Grade 9. In other words, the literature base is too skewed to 
give a full picture of the relations of various reading prosody features. Future studies, 
particularly those that examine reading prosody with spectrographic analysis across the reading 
development phases (particularly with students in upper elementary and secondary schools), are 
warranted. 

We explored whether orthographic depth moderates the relation between reading prosody and 
reading comprehension. We hypothesized that the overall relation between reading prosody and 
reading comprehension might be stronger in deep orthographies because reading prosody in deep 
orthographies likely reflects morphological and morphosyntactic knowledge to a greater extent than it 
does in shallow orthographies. However, our results showed no difference in the magnitude of the 
relation between reading prosody and reading comprehension as a function of orthographic depth 
(Δr = .10; p = .11). It is important to note that we were not able to address this question by prosody 
feature because there were no effect sizes from spectrographic analysis of readers of shallow 
orthographies, and thus, our analysis reflects the relation of reading prosody as a whole to reading 
comprehension. Future crosslinguistic endeavors that use spectrographic analysis are needed, especially 
in languages with shallow orthographies (Kuhn et al., 2010). 

We also explored whether the magnitude of the relation between reading prosody and reading 
comprehension differs by grade (a proxy for reading development phase), and we found that the 
relation did not differ by grade. This is in line with our speculation that reading prosody relates to 
reading comprehension across developmental phases as reading captures both decoding and semantic 
processes. Similar to the moderation question about orthographic depth, however, differential rela- 
tions were not addressed by prosody features due to the limited number of studies as noted above. 


Limitations and Future Research 


There were several limitations of this study. First, it should be noted that our goal in this study was to 
estimate the magnitude of the relation between reading prosody and reading comprehension, not the 
directionality of the relation. Directionality inquiry can be best addressed with longitudinal studies 
and experimental studies. For example, Cypert and Petro (2019) found that university students who 
received a prosodic reading intervention had significantly better reading comprehension than the 
control group on the posttest. Note though that research on the causal role of reading prosody on 
reading comprehension is extremely limited (Ardoin, Morena, Binder, & Foster, 2013) and the present 
meta-analysis does not allow us to draw inferences on practical implications. 

In addition, very few studies used spectrographic analysis, and all the studies with spectrographic 
analysis involved students in Grades 1-5 in languages with deep orthographies (English, 
French). The lack of studies examining reading prosody using spectrographic analysis in shallow 
orthographies greatly limited the analysis in the present study. Specifically, it would have been ideal to 
address the third research question by reading prosody features. This would have allowed us to 
evaluate whether varying magnitudes as a function of orthographic depth differ by reading prosody 
features. For example, pause structure (e.g., ungrammatical pauses) may have a strong relation with 
reading comprehension for a longer time in deep orthographies than in shallow orthographies because 
of the slower rate of word reading development in deep orthographies (Aro & Wimmer, 2003; Ellis 
et al., 2004; Katz & Frost, 1992; Seymour et al., 2003). On the other hand, the pitch aspect of reading 
prosody that captures semantic processing may relate to reading comprehension at an earlier grade in 
shallow orthographies than in deep orthographies, again due to the differences in how long decoding 
constrains reading processes. 

In the present study, we used grade levels as a proxy for developmental phase of reading. However, 
grade is a rough proxy, and there is variation across education systems in terms of the grade in which 
reading instruction starts. Furthermore, although grouping studies by developmental phase, as 
others have done (e.g., by grades in similar developmental stages, Florit & Cain, 2011; or by age, 
Garcia & Cain, 2014), may lead to more robust results (Petscher, 2010), we did not do so because of the 
limited number of studies in different developmental phases (e.g., only one study in an older grade used 
spectrographic analysis; May, 2014) and because developmental phases may vary by orthographic depth. 

We included a set of moderators based on theory. However, future studies should investigate other 
potential moderators. One example is text difficulty. Although the impact of text difficulty on the 
relation between reading prosody and reading comprehension has been examined (e.g., Benjamin & 
Schwanenflugel, 2010), there was an insufficient number of studies to conduct reliable moderator 
analysis (minimum of 4 effect sizes; Borenstein et al., 2011). Another example is moderation by 
individual skills such as decoding, oral language, and higher order cognitive skills. As noted above, 
both decoding and semantic skills are needed for reading prosody. Therefore, individuals’ decoding 
and meaning-making ability (e.g., morphological, syntactic, and inferencing ability) might moderate 
the relation between reading prosody and reading comprehension (e.g., see Ravid & Mashraki, 2007). 
Future research should more deeply explore these potential moderators. Additionally, future studies 
should explore whether the relation between reading prosody and reading comprehension varies by 
features of reading comprehension assessments. Reading comprehension in the studies included here 
was measured in a variety of ways, including open-ended, multiple-choice, oral retell, and multiple-choice 
mixed with open-ended formats. Some of the reading comprehension assessments (e.g., QRI) assessed the 
same text that was read aloud for the reading prosody measure, while others used a different 
assessment. Given the multiple assessment formats and mixed formats, we could not examine whether the 
relation between reading prosody and reading comprehension differed by assessment format. On 
a related note, the majority of studies did not indicate whether the reading comprehension 
assessment was oral or silent, and therefore, we could not examine whether the relation between reading 
prosody and reading comprehension varied by reading mode. 

Another direction in future research is an examination of the relation for students with learning 
disabilities or second language learners as there were very few studies for these populations. For 
example, people with certain disabilities, such as those on the autism spectrum, have been 
found to have atypical prosody (McCann & Peppé, 2003); thus, future studies are warranted on the 
relation of reading prosody and reading comprehension for these populations. 

In conclusion, in the present systematic review and meta-analysis, we found a moderate relation 
between reading prosody and reading comprehension, and differential relations as a function of 
reading prosody features. Another important finding is a critical gap in the literature: There are 
insufficient studies measuring different features of reading prosody (e.g., spectrographic analysis), 
varying orthographic characteristics (e.g., limited shallow orthographies), and reader skills (e.g., 
beyond primary grades). Thus, our understanding of the exact nature of the relation between reading 
prosody and reading comprehension is lacking and future studies are needed. 


Note 


1. Data are available upon request from the first author. 